Experiments with Scheduling Using Simulated Annealing in a Grid Environment

Authors

  • Asim YarKhan
  • Jack J. Dongarra
Abstract

Generating high-quality schedules for distributed applications on a Computational Grid is a challenging problem. Some experiments using Simulated Annealing as a scheduling mechanism for a ScaLAPACK LU solver on a Grid are described. The Simulated Annealing scheduler is compared to an Ad-Hoc Greedy scheduler used in earlier experiments. The Simulated Annealing scheduler exposes some assumptions built into the Ad-Hoc scheduler and some problems with the Performance Model being used.

1 Scheduling in the GrADS Project

Despite the existence of several Grid infrastructure projects such as Globus [9] and Legion [11], programming, executing and monitoring applications on a Computational Grid remains a user-intensive process. The goal of the Grid Application Development Software (GrADS) [4] project is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing. The GrADS project intends to provide tools and technologies for the development and execution of applications in a Grid environment. This includes tasks such as locating available resources on the Grid and scheduling an application on an appropriate subset of the resources.

This paper presents some experiments on automated scheduling in a Grid environment. The scheduling is done over a non-homogeneous set of Grid resources residing at geographically disparate locations, and uses dynamic machine status and connectivity information from the Globus Metacomputing Directory Service (MDS) [9, 10] and the Network Weather Service (NWS) [19].

The naive approach of testing all possible machine schedules to select the best schedule quickly becomes intractable as the number of machines grows. When N machines are available, the naive approach would require checking approximately 2^N possible subsets of machines.
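The growth of the naive search can be illustrated with a minimal Python sketch. Everything named here (count_candidate_subsets, exhaustive_best_subset, TOY_SPEEDS, toy_predict_time) is hypothetical and chosen only for illustration; in particular, the toy cost function is a stand-in and is not the Performance Model used in the paper.

```python
from itertools import combinations


def count_candidate_subsets(n_machines):
    """Number of non-empty subsets of n_machines machines (2^N - 1)."""
    return 2 ** n_machines - 1


def exhaustive_best_subset(machines, predict_time):
    """Try every non-empty subset and keep the lowest predicted time."""
    best_subset, best_time = None, float("inf")
    for k in range(1, len(machines) + 1):
        for subset in combinations(machines, k):
            t = predict_time(subset)
            if t < best_time:
                best_subset, best_time = subset, t
    return best_subset, best_time


# Hypothetical relative machine speeds, for illustration only.
TOY_SPEEDS = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 8.0}


def toy_predict_time(subset):
    """Toy stand-in for a performance model (an assumption, not the paper's):
    fixed work divided by total speed, plus a per-machine overhead."""
    total_speed = sum(TOY_SPEEDS[m] for m in subset)
    return 100.0 / total_speed + 2.0 * len(subset)


best, t = exhaustive_best_subset(sorted(TOY_SPEEDS), toy_predict_time)
print(best, round(t, 3))
# The subset count roughly doubles with every machine added.
print([count_candidate_subsets(n) for n in (10, 20, 40)])
```

With four machines the search evaluates only 15 subsets, but the same loop at N = 40 would need about 10^12 model evaluations, which is why the exhaustive approach is not practical on a Grid of realistic size.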
This minimum execution-time multiprocessor scheduling problem is known to be NP-hard in its generalized form, and is NP-hard even in some restricted forms [16]. Many heuristics exist that can be used to reduce the search space, and search strategies such as greedy searches (which rank-order the machines using some criteria) and (non)-linear programming searches (which seek to minimize an objective function given certain constraints) can be used to find solutions. However, these techniques generally do not contain mechanisms to avoid local minima.

There are many research efforts aimed at scheduling strategies for the Grid [5, 17, 18, 1, 2]; see Berman [3] for an overview of scheduling on the Grid and a summary of alternative approaches. Berman argues that a successful scheduling strategy for the Grid has to produce time-frame specific performance predictions, has to use dynamic information, and has to adapt to a variety of potential computational environments. Scheduling in the GrADS project takes dynamic resource information about a distributed, heterogeneous Grid environment, and tries to generate a schedule to minimize the execution time.

∗ This work is supported in part by the National Science Foundation contract GRANT #E81-9975020, SC R36505-29200099, R011030-09, “Next Generation Software: Grid Application Development Software (GrADS)”.

1.1 Numerical Libraries and the Grid

As part of an earlier GrADS project demonstration [14], a ScaLAPACK [6] numerical solver routine (i.e., the LU solver routine PDGESV) was analyzed to obtain an accurate Performance Model for the routine. This Performance Model is used to predict the execution time for the routine given the current machine characteristics (i.e., CPU load, free memory) and their current connection characteristics (i.e., bandwidth and latency). The information from the Performance Model can be used to schedule the routine on a subset of the available resources to execute in the minimum time.
Scheduling the LU solver routine is somewhat complicated by the fact that the minimum per-machine memory requirements change as the number of machines chosen varies. This means that if the selection or removal of a machine changes the number of chosen machines, all the other currently selected machines may need to be reevaluated.

An Ad-Hoc greedy approach was used for scheduling in the earlier GrADS project demonstration [14] (a slightly modified version of this scheduler is described in this document). Experiments in a simplistic, homogeneous, single-cluster environment have shown that this scheduler can predict the execution time with better than 95% accuracy.

1.2 Ad-Hoc Greedy Scheduler Used in the ScaLAPACK Experiment

The scheduling algorithm used in the ScaLAPACK LU solver demonstration [14] uses an Ad-Hoc greedy technique in conjunction with a hand-crafted Performance Model to select the machines on which to schedule the execution. The list of all qualified, currently available machines is obtained from the Globus Metacomputing Directory Service (MDS); it may contain machines from several geographically distributed clusters. The Network Weather Service (NWS) is contacted to obtain details pertaining to each machine (i.e., the CPU load, the available memory) and the latency and bandwidth between machines. This detailed resource information is used by the Performance Model to estimate the execution time for the ScaLAPACK routine. The Ad-Hoc scheduling algorithm can be approximated as in Algorithm 1.
Algorithm 1: Scheduling using Ad-Hoc greedy scheduler
  1: for each cluster, starting a new search in the cluster do
  2:   select fastest machine in the cluster to initialize
  3:   repeat
  4:     find a new machine which has highest average bandwidth to the machines that are already selected and add it to the selected machines
  5:     ensure that memory constraints are met by all machines (details omitted here)
  6:     use the Performance Model with detailed machine and network information to predict the execution time
  7:   until the Performance Model shows that execution time is no longer decreasing
  8:   track the best solution
  9: end for

In this algorithm, new machines are ordered by their average bandwidth to the machines that have already been selected, and they are added to the selection in this order. This technique returns a good set of machines for the application, but it assumes that communication is the major factor determining the execution time of the algorithm. Other greedy techniques using different orderings have been implemented within the GrADS resource selection process, for example, using CPU load to order the machines [8].

2 Global Scheduling Strategies

The scheduling problem can be viewed as a multivariate optimization problem, where the application is being assigned to a set of machines so as to optimize some metric (i.e., the overall execution time). Techniques exist for finding locally optimal solutions for multivariate optimization problems, such as gradient descent techniques, linear programming, or greedy techniques. However, these techniques only search some local space, and will not find the global optimum if it is not contained in that local space. For example, the Ad-Hoc greedy method orders the machines by communication, and thus will not find the optimal solution if it is not contained in that ordering.
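The structure of Algorithm 1 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the GrADS implementation: the function adhoc_greedy, the toy data, and the toy cost model toy_predict are all hypothetical; the memory-constraint step is skipped because the source omits its details; and candidates are assumed to be drawn from all clusters, which the source does not state explicitly.

```python
from itertools import combinations


def adhoc_greedy(clusters, speed, bandwidth, predict_time):
    """Greedy machine selection in the spirit of Algorithm 1.

    clusters: dict of cluster name -> list of machine names
    speed: dict of machine name -> relative speed
    bandwidth: function (m1, m2) -> bandwidth between two machines
    predict_time: function (list of machines) -> predicted execution time
    """
    all_machines = [m for members in clusters.values() for m in members]
    best_set, best_time = None, float("inf")
    for members in clusters.values():
        # Step 2: start from the fastest machine in this cluster.
        selected = [max(members, key=lambda m: speed[m])]
        current = predict_time(selected)
        while True:
            candidates = [m for m in all_machines if m not in selected]
            if not candidates:
                break
            # Step 4: highest average bandwidth to the selected machines.
            nxt = max(
                candidates,
                key=lambda m: sum(bandwidth(m, s) for s in selected) / len(selected),
            )
            # Step 5 (memory constraints) is omitted in this sketch.
            trial_time = predict_time(selected + [nxt])  # Step 6
            if trial_time >= current:  # Step 7: no longer decreasing
                break
            selected, current = selected + [nxt], trial_time
        if current < best_time:  # Step 8: track the best solution
            best_set, best_time = selected, current
    return best_set, best_time


# Hypothetical toy data: two clusters, fast links inside a cluster, slow across.
clusters = {"X": ["x1", "x2"], "Y": ["y1"]}
speed = {"x1": 4.0, "x2": 2.0, "y1": 8.0}
cluster_of = {m: c for c, ms in clusters.items() for m in ms}


def bandwidth(a, b):
    return 100.0 if cluster_of[a] == cluster_of[b] else 1.0


def toy_predict(machines):
    """Toy cost model (an assumption): compute + per-machine + pairwise comm."""
    compute = 100.0 / sum(speed[m] for m in machines)
    comm = sum(10.0 / bandwidth(a, b) for a, b in combinations(machines, 2))
    return compute + 2.0 * len(machines) + comm


best_set, best_time = adhoc_greedy(clusters, speed, bandwidth, toy_predict)
print(best_set, best_time)
```

Because each search only ever extends the selection in bandwidth order and stops at the first non-improving step, the sketch examines a small number of machine sets per cluster rather than the full space of subsets, which is the limitation discussed in Section 2.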

Similar resources

An Integrated Model of Project Scheduling and Material Ordering: A Hybrid Simulated Annealing and Genetic Algorithm

This study aims to deal with a more realistic combined problem of project scheduling and material ordering. The goal is to minimize the total material holding and ordering costs by determining the starting time of activities along with material ordering schedules subject to some constraints. The problem is first mathematically modelled. Then a hybrid simulated annealing and genetic algorithm is...

Quay Cranes and Yard Trucks Scheduling Problem at Container Terminals

A bi-objective mathematical model is developed to simultaneously consider the quay crane and yard truck scheduling problems at container terminals. Main real-world assumptions, such as quay cranes with non-crossing constraints, quay cranes’ safety margins and precedence constraints are considered in this model. This integrated approach leads to better efficiency and productivity at container te...

Task Scheduling in Grid Environment Using Simulated Annealing and Genetic Algorithm

Grid computing enables access to geographically and administratively dispersed networked resources and delivers functionality of those resources to individual users. Grid computing systems are about sharing computational resources, software and data at a large scale. The main issue in grid system is to achieve high performance of grid resources. It requires techniques to efficiently and adaptiv...

A Complex Network-Based Approach for Job Scheduling in Grid Environments

Many optimization techniques have been adopted for efficient job scheduling in grid computing, such as: genetic algorithms, simulated annealing and stochastic methods. Such techniques present common problems related to the use of inaccurate and out-of-date information, which degrade the global system performance. Besides that, they also do not properly model a grid environment. In order to adeq...

Beyond Simulated Annealing in Grid Scheduling

In a Grid environment the number of resources and tasks to be scheduled is usually variable and dynamic in nature. This characteristic emphasizes the scheduling approach as a complex optimization problem. Scheduling is a key issue which must be solved in grid computing study and a better scheduling scheme can greatly improve the efficiency. The objective of this paper is to explore and investigate...

A Multi-objective simulated annealing algorithm to solving flexible no-wait flowshop scheduling problems with transportation times

This paper deals with a bi-objective hybrid no-wait flowshop scheduling problem minimizing the makespan and total weighted tardiness, in which we consider transportation times between stages. Obtaining an optimal solution for this type of complex, large-sized problem in reasonable computational time by using traditional approaches and optimization tools is extremely difficult. This paper presen...

Publication date: 2002